Single Cell Application¶

Instructors:
Zongxu Zhang (zongxu.zhang@stu.pku.edu.cn)
Zexian Zeng (zexianzeng@pku.edu.cn)

Affiliation:
Center for Quantitative Biology
Academy for Advanced Interdisciplinary Studies
Peking University

Lab Website:
🔗 http://cqb.pku.edu.cn/zenglab/

This materials is largely based on Pancreatic endocrinogenesis differential geometry, we thank the origional authors for their contribution.

Review of Last Week’s Topics¶

  1. Basic scRNA-seq Mapping Process
    Overview of the standard single-cell RNA-sequencing (scRNA-seq) analysis workflow — from raw read alignment and expression matrix generation to downstream preprocessing steps.

  2. R-based Workflow: Seurat
    Introduction to the Seurat package for scRNA-seq data preprocessing and analysis, including:

    • Quality control and filtering
    • Identification of highly variable genes
    • Data normalization and scaling
    • Dimensionality reduction (PCA, UMAP, t-SNE)
    • Cell clustering and visualization
    • Annotation and dataset integration
  3. Python-based Workflow: Scanpy
    Overview of the Scanpy package as a Python counterpart to Seurat, supporting similar functions for data preprocessing, clustering, and visualization in a scalable, efficient framework.

💡 Note: This course will primarily focus on Python-based workflows for scRNA-seq data analysis.

Prerequisite: Creat the enviroment and install packages¶

  1. Create a new enviroment with python

    conda create --name dynamo python=3.11

  2. Activate the enviroment

    conda activate dynamo

  3. Install the following packages

    conda install -c conda-forge dynamo-release : to install latest version of dynamo

    or, for effeicient installation, using mamba,

    mamba install -c conda-forge dynamo-release

    mamba install scanpy : to install scanpy for data io

    pip install matplotlib==3.7.3 scipy==1.13.1 : to install dependicies of dynamo

    pip install gseapy : to install GO enrichment package

    If you are using jupyterlab, nb_conda_kernels will automatically register the kernel for you

    mamba install ipykernel

In [1]:
# Check python version
!python --version
Python 3.11.10
In [2]:
# Check data exists
# Data collected from https://journals.biologists.com/dev/article/146/12/dev173849/19483/Comprehensive-single-cell-mRNA-profiling-reveals-a
!ls -ll -h /home/zongxu/Misc/lesson/scRNA-application/data/endocrinogenesis_day15.h5ad
-rw-rw-r-- 1 zongxu zongxu 51M Oct 30  2024 /home/zongxu/Misc/lesson/scRNA-application/data/endocrinogenesis_day15.h5ad
In [3]:
# ! source dynamo/bin/activate # in cluster

Part 1: Import all packages¶

In [4]:
# Import core packages
import dynamo as dyn
import scanpy as sc

# Disable annoying warnings
import warnings
from numba import NumbaWarning

warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=NumbaWarning)
warnings.filterwarnings("ignore", category=UserWarning)

import os
os.chdir('/home/zongxu/Misc/lesson/scRNA-application/')

# Adjust visualization
# from IPython.core.display import display, HTML
# display(HTML("<style>.container { width:90% !important; }</style>"))

# Direct show figure in jupyter
%matplotlib inline
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/numba/np/ufunc/dufunc.py:343: NumbaWarning: Compilation requested for previously compiled argument types ((uint32,)). This has no effect and perhaps indicates a bug in the calling code (compiling a ufunc more than once for the same signature
  warnings.warn(msg, errors.NumbaWarning)
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/numba/np/ufunc/dufunc.py:343: NumbaWarning: Compilation requested for previously compiled argument types ((uint32,)). This has no effect and perhaps indicates a bug in the calling code (compiling a ufunc more than once for the same signature
  warnings.warn(msg, errors.NumbaWarning)
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/numba/np/ufunc/dufunc.py:343: NumbaWarning: Compilation requested for previously compiled argument types ((uint32,)). This has no effect and perhaps indicates a bug in the calling code (compiling a ufunc more than once for the same signature
  warnings.warn(msg, errors.NumbaWarning)
In [5]:
# check the working dir
os.getcwd()
Out[5]:
'/home/zongxu/Misc/lesson/scRNA-application'
In [6]:
# Check dynamo and all dependencies version
dyn.get_all_dependencies_version()
package umap-learn typing-extensions tqdm statsmodels setuptools session-info seaborn scipy scikit-learn pynndescent pre-commit pandas openpyxl numpy numdifftools numba networkx matplotlib loompy get-version dynamo-release colorcet anndata
version 0.5.6 4.12.2 4.66.6 0.14.4 75.1.0 1.0.0 0.13.2 1.13.1 1.5.2 0.5.13 4.0.1 2.2.3 3.1.5 1.23.5 0.9.41 0.60.0 3.4 3.7.3 3.0.6 3.5.5 1.3.2 3.1.0 0.10.9
In [7]:
# Set resolution/size, styling and format of figures
dyn.configuration.set_figure_params("dynamo", background="white", figsize=(3, 2))

Part 2: Data structure and IO¶

In [8]:
# usually using sc.read_10x_mtx or sc.read_10x_h5
data_path = './data/endocrinogenesis_day15.h5ad'
adata = sc.read_h5ad(data_path)
In [9]:
adata
Out[9]:
AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
    var: 'highly_variable_genes'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
    obsm: 'X_pca', 'X_umap'
    layers: 'spliced', 'unspliced'
    obsp: 'distances', 'connectivities'
Get to know anndata, basic data structure in python for single cell analysis¶
alt text
In [10]:
# X for sorting counts matrix, typically in Compressed Sparse Row format
adata.X
Out[10]:
<3696x27998 sparse matrix of type '<class 'numpy.float32'>'
	with 9298890 stored elements in Compressed Sparse Row format>
In [11]:
# see what exactly does X have
print(adata.X)
  (0, 18)	2.0
  (0, 25)	2.0
  (0, 32)	14.0
  (0, 35)	2.0
  (0, 54)	1.0
  (0, 76)	1.0
  (0, 97)	1.0
  (0, 104)	1.0
  (0, 109)	3.0
  (0, 135)	1.0
  (0, 154)	1.0
  (0, 156)	1.0
  (0, 158)	1.0
  (0, 171)	1.0
  (0, 198)	2.0
  (0, 216)	1.0
  (0, 247)	1.0
  (0, 257)	3.0
  (0, 258)	1.0
  (0, 264)	13.0
  (0, 265)	2.0
  (0, 274)	1.0
  (0, 277)	1.0
  (0, 278)	1.0
  (0, 325)	4.0
  :	:
  (3695, 27067)	1.0
  (3695, 27071)	1.0
  (3695, 27072)	1.0
  (3695, 27074)	1.0
  (3695, 27084)	1.0
  (3695, 27136)	1.0
  (3695, 27140)	1.0
  (3695, 27149)	3.0
  (3695, 27153)	21.0
  (3695, 27154)	11.0
  (3695, 27160)	1.0
  (3695, 27169)	1.0
  (3695, 27184)	4.0
  (3695, 27206)	1.0
  (3695, 27209)	2.0
  (3695, 27211)	1.0
  (3695, 27218)	1.0
  (3695, 27220)	1.0
  (3695, 27221)	1.0
  (3695, 27222)	3.0
  (3695, 27230)	1.0
  (3695, 27231)	2.0
  (3695, 27243)	2.0
  (3695, 27246)	8.0
  (3695, 27256)	2.0
In [12]:
# convert it to a numpy array
adata.X.toarray()
Out[12]:
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
In [13]:
adata.X.toarray().shape
Out[13]:
(3696, 27998)
In [14]:
# different layer sorted for different gene counts,
# which is used for calculate RNA velocity
# to call this for you data, please refer to velocyto
adata.layers['spliced'], adata.layers['unspliced']
Out[14]:
(<3696x27998 sparse matrix of type '<class 'numpy.float32'>'
 	with 9298890 stored elements in Compressed Sparse Row format>,
 <3696x27998 sparse matrix of type '<class 'numpy.float32'>'
 	with 3156504 stored elements in Compressed Sparse Row format>)
In [15]:
# adata.obs sorted cell meta data, include cluser and some specific score
adata.obs.head(3)
Out[15]:
clusters_coarse clusters S_score G2M_score
index
AAACCTGAGAGGGATA Pre-endocrine Pre-endocrine -0.224902 -0.252071
AAACCTGAGCCTTGAT Ductal Ductal -0.014707 -0.232610
AAACCTGAGGCAATTA Endocrine Alpha -0.171255 -0.286834
In [16]:
# adata.var sorted gene meta data
adata.var.head(3)
Out[16]:
highly_variable_genes
index
Xkr4 False
Gm37381 NaN
Rp1 NaN
In [17]:
# adata.obsm sorted some coordinated of cell
adata.obsm
Out[17]:
AxisArrays with keys: X_pca, X_umap

Part 3: Preprocess¶

In [18]:
# known differention related genes in pancreas
pancreas_genes = [
    "Hes1", "Nkx6-1", "Nkx2-2",
    "Neurog3", "Neurod1", "Pax4",
    "Pax6", "Arx", "Pdx1",
    "Ins1", "Ins2", "Ghrl",
    "Ptf1a", "Iapp", "Isl1",
    "Sox9", "Gcg",
]

figure_path = './figure/'
In [19]:
# do preprocessing, analog to scanpy
# select 4000 top variavle genes and filter genes that are detected in less than 20 counts (shared)
# ensure all pancrease genes are in the list
dyn.pp.recipe_monocle(adata, n_top_genes=4000,
                      fg_kwargs={"shared_count": 20},
                      genes_to_append=pancreas_genes)
|-----? dynamo.preprocessing.deprecated is deprecated.
|-----> recipe_monocle_keep_filtered_cells_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_filtered_cells_key=True
|-----> recipe_monocle_keep_filtered_genes_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_filtered_genes_key=True
|-----> recipe_monocle_keep_raw_layers_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_raw_layers_key=True
|-----> apply Monocole recipe to adata...
/tmp/ipykernel_43792/4080765733.py:4: DeprecationWarning: recipe_monocle is deprecated and will be removed in a future release. Please update your code to use the new replacement function.
  dyn.pp.recipe_monocle(adata, n_top_genes=4000,
|-----> ensure all cell and variable names unique.
|-----> ensure all data in different layers in csr sparse matrix format.
|-----> ensure all labeling data properly collapased
|-----> filtering cells...
|-----> 3696 cells passed basic filters.
|-----> filtering gene...
|-----> 6956 genes passed basic filters.
|-----> calculating size factor...
|-----> selecting genes in layer: X, sort method: SVR...
|-----> size factor normalizing the data, followed by log1p transformation.
|-----> Set <adata.X> to normalized data
|-----> applying PCA ...
|-----> <insert> X_pca to obsm in AnnData Object.
|-----> cell cycle scoring...
|-----> computing cell phase...
|-----> [Cell Phase Estimation] completed [31.4415s]
|-----> [Cell Cycle Scores Estimation] completed [0.4372s]
|-----> [recipe_monocle preprocess] completed [7.9795s]

Part 4: Calculate RNA velocity¶

In [20]:
# Setting the model into stochastic mode, for cells may not at steady state
dyn.tl.dynamics(adata, model="stochastic")
|-----> dynamics_del_2nd_moments_key is None. Using default value from DynamoAdataConfig: dynamics_del_2nd_moments_key=False
|-----------> removing existing M layers:[]...
|-----------> making adata smooth...
|-----> calculating first/second moments...
|-----> [moments calculation] completed [21.4898s]
estimating gamma: 100%|██████████| 4000/4000 [04:08<00:00, 16.10it/s]
Out[20]:
AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'Size_Factor', 'initial_cell_size', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'ntr', 'cell_cycle_phase'
    var: 'highly_variable_genes', 'nCells', 'nCounts', 'pass_basic_filter', 'log_cv', 'log_m', 'score', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics'
    obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores'
    layers: 'spliced', 'unspliced', 'X_unspliced', 'X_spliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S'
    obsp: 'distances', 'connectivities', 'moments_con'
In [21]:
# Dimension reduction, and then calculate UMAP basis
dyn.tl.reduceDimension(adata, n_pca_components=30)
|-----> retrieve data for non-linear dimension reduction...
|-----? adata already have basis umap. dimension reduction umap will be skipped! 
set enforce=True to re-performing dimension reduction.
|-----> [UMAP] completed [0.0040s]
In [22]:
# Project high dimensional velocity vectors onto
# UMAP low dimensional embeddings
dyn.tl.cell_velocities(adata, method="pearson", other_kernels_dict={"transform": "sqrt"})
# PCA low dimensional embeddings
dyn.tl.cell_velocities(adata, basis="pca")
|-----> [calculating transition matrix via pearson kernel with sqrt transform.] in progress: 100.0000%|-----> [calculating transition matrix via pearson kernel with sqrt transform.] completed [2.6657s]
|-----> [projecting velocity vector to low dimensional embedding] in progress: 100.0000%|-----> [projecting velocity vector to low dimensional embedding] completed [0.7203s]
Using existing pearson_transition_matrix found in .obsp.
|-----> [projecting velocity vector to low dimensional embedding] in progress: 100.0000%|-----> [projecting velocity vector to low dimensional embedding] completed [0.7704s]
Out[22]:
AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'Size_Factor', 'initial_cell_size', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'ntr', 'cell_cycle_phase'
    var: 'highly_variable_genes', 'nCells', 'nCounts', 'pass_basic_filter', 'log_cv', 'log_m', 'score', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics', 'use_for_transition'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics', 'grid_velocity_umap', 'grid_velocity_pca'
    obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores', 'velocity_umap', 'velocity_pca'
    layers: 'spliced', 'unspliced', 'X_unspliced', 'X_spliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S'
    obsp: 'distances', 'connectivities', 'moments_con', 'pearson_transition_matrix'
In [23]:
# Plot the velocity streamline
dyn.pl.streamline_plot(adata, 
                       color=["clusters"], 
                       basis="umap", 
                       show_legend="on data", 
                       show_arrowed_spines=True,
                       save_show_or_return='save',
                       save_kwargs={
                           "prefix": "01_velocity_streamline",  # prefix
                           "ext": "png",                        # file format
                           "path": figure_path,                 # save dir (default current working dir)
                           "dpi": 300,                          # resolution
                           "transparent": True,
                           "close": True,
                           "verbose": True
                       }
                      )
|-----------> plotting with basis key=X_umap
|-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type
Saving figure to ./figure/01_velocity_streamline_dyn_savefig.png...
Done
In [24]:
# Plot the velocity vector per cell
dyn.pl.cell_wise_vectors(
    adata,
    color=["clusters"],
    basis="umap",
    show_legend="on data",
    quiver_length=4,
    quiver_size=4,
    figsize=(5, 3),
    show_arrowed_spines=False,
    save_show_or_return='save',
    save_kwargs={
        "prefix": "02_cell_wise_velocity",  # prefix
        "ext": "png",                       # file format
        "path": figure_path,                # save dir (default current working dir)
        "dpi": 300,                         # resolution
        "transparent": True,
        "close": True,
        "verbose": True
    }
)
|-----> X shape: (3696, 2) V shape: (3696, 2)
|-----------> plotting with basis key=X_umap
|-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type
Saving figure to ./figure/02_cell_wise_velocity_dyn_savefig.png...
Done

Part 5: Learn the vector field¶

In [25]:
# Setting the progenitor cell types
progenitor = adata.obs_names[adata.obs.clusters.isin(["Ductal"])]
len(progenitor)
Out[25]:
916

Then, lets dive into some basic differential geometry

alt text
In [26]:
# Estimate the vector field
dyn.vf.VectorField(adata, basis="pca", pot_curl_div=True)
dyn.vf.VectorField(adata, basis="umap", pot_curl_div=True)

# Calculate the speed, divergence, accelaration and curl 
dyn.vf.speed(adata, basis="pca")
dyn.vf.divergence(adata, basis="pca")
dyn.vf.acceleration(adata, basis="pca")
dyn.vf.curl(adata, basis="umap")
|-----> VectorField reconstruction begins...
|-----> Retrieve X and V based on basis: PCA. 
        Vector field will be learned in the PCA space.
|-----> Learning vector field with method: sparsevfc.
|-----> [SparseVFC] begins...
|-----> Sampling control points based on data velocity magnitude...
|-----> [SparseVFC] completed [1.0299s]
|-----> Running ddhodge to estimate vector field based pseudotime in pca basis...
|-----> graphizing vectorfield...
|-----> incomplete neighbor graph info detected: connectivities and distances do not exist in adata.obsp, indices not in adata.uns.neighbors.
|-----> Neighbor graph is broken, recomputing....
|-----> Start computing neighbor graph...
|-----------> X_data is None, fetching or recomputing...
|-----> fetching X data from layer:None, basis:pca
|-----> method arg is None, choosing methods automatically...
|-----------> method ball_tree selected
|-----------? nbrs_idx argument is ignored and recomputed because nbrs_idx is not None and return_nbrs=True
|-----------> calculating neighbor indices...
|-----> [ddhodge completed] completed [12.1489s]
|-----> Computing divergence...
Calculating divergence: 100%|██████████| 4/4 [00:00<00:00, 14.69it/s]
|-----> [VectorField] completed [13.8660s]
|-----> VectorField reconstruction begins...
|-----> Retrieve X and V based on basis: UMAP. 
        Vector field will be learned in the UMAP space.
|-----> Generating high dimensional grids and convert into a row matrix.
|-----> Learning vector field with method: sparsevfc.
|-----> [SparseVFC] begins...
|-----> Sampling control points based on data velocity magnitude...
|-----> [SparseVFC] completed [3.8171s]
|-----> Running ddhodge to estimate vector field based pseudotime in umap basis...
|-----> graphizing vectorfield...
|-----------? nbrs_idx argument is ignored and recomputed because nbrs_idx is not None and return_nbrs=True
|-----------> calculating neighbor indices...
|-----> [ddhodge completed] completed [10.7601s]
|-----> Computing curl...
Calculating 2-D curl: 100%|██████████| 3696/3696 [00:00<00:00, 23880.32it/s]
|-----> Computing divergence...
Calculating divergence: 100%|██████████| 4/4 [00:00<00:00, 25.93it/s]
|-----> [VectorField] completed [15.2468s]
Calculating divergence: 100%|██████████| 4/4 [00:00<00:00, 12.84it/s]
|-----> [Calculating acceleration] in progress: 100.0000%|-----> [Calculating acceleration] completed [0.1335s]
Calculating 2-D curl: 100%|██████████| 3696/3696 [00:00<00:00, 23821.17it/s]
In [27]:
# Draw figures
dyn.configuration.set_figure_params("dynamo", background="black")

import matplotlib.pyplot as plt

fig1, fig1_axes = plt.subplots(ncols=2, nrows=2, constrained_layout=True, figsize=(10, 6))
dyn.pl.cell_wise_vectors(
    adata,
    color="pca_ddhodge_potential",
    pointsize=0.1,
    alpha=0.7,
    ax=fig1_axes[0, 0],
    quiver_length=6,
    quiver_size=6,
    save_show_or_return="return",
    background="black",
)
dyn.pl.grid_vectors(
    adata,
    color="speed_pca",
    basis="umap",
    ax=fig1_axes[0, 1],
    quiver_length=12,
    quiver_size=12,
    save_show_or_return="return",
    background="black",
)
dyn.pl.streamline_plot(
    adata, color="divergence_pca", basis="umap", ax=fig1_axes[1, 0], save_show_or_return="return", background="black"
)
dyn.pl.streamline_plot(
    adata, color="acceleration_pca", basis="umap", ax=fig1_axes[1, 1], save_show_or_return="return", background="black"
)

fig1.savefig('./figure/03_combined_plots.png')
plt.show()
|-----> X shape: (3696, 2) V shape: (3696, 2)
|-----------> plotting with basis key=X_umap
|-----------> plotting with basis key=X_umap
|-----------> plotting with basis key=X_umap
|-----------> plotting with basis key=X_umap

The acceleration and divergence accurately highlight hotspots, including a saddle point in ductal cells (negative divergence), exit from this state to early endocrine progenitors (high positive acceleration), the bifurcation point for late progenitors to differentiate into stable cell types (high acceleration and positive divergence), and stable cell types (negative divergence).

In [28]:
# Draw the potential cell states
dyn.pl.topography(
    adata, 
    basis="umap", 
    background="white", 
    color=["clusters"],
    streamline_color="black",
    show_legend="on data",
    save_show_or_return='save',
    save_kwargs={
        "prefix": "04_cell_state",  # prefix
        "ext": "png",                       # file format
        "path": figure_path,                       # save dir (default current working dir)
        "dpi": 300,                         # resolution
        "transparent": True,
        "close": True,
        "verbose": True
    }
)
|-----> Vector field for umap is but its topography is not mapped. Mapping topography now ...
|-----------> plotting with basis key=X_umap
|-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type
Saving figure to ./figure/04_cell_state_dyn_savefig.png...
Done
In [29]:
# Draw the state transition map
dyn.configuration.set_figure_params("dynamo", background="white")
dyn.pd.state_graph(adata, group='clusters', basis='pca', method='vf')
dyn.pl.state_graph(adata,
                   color=['clusters'],
                   group='clusters',
                   basis='umap',
                   show_legend='on data',
                   method='vf',
                   save_show_or_return='save',
                   save_kwargs={
                       "prefix": "05_cell_state_transition_map",  # prefix
                       "ext": "png",                       # file format
                       "path": figure_path,                       # save dir (default current working dir)
                       "dpi": 300,                         # resolution
                       "transparent": True,
                       "close": True,
                       "verbose": True
                   }
                  )
|-----> Estimating the transition probability between cell types...
|-----> Applying vector field
|-----> [KDTree parameter preparation computation] in progress: 0.0000%|-----> [KDTree computation] completed [0.0013s]
|-----> [iterate groups] in progress: 12.5000%
integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 22.21it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 128.36it/s]
|-----> [iterate groups] in progress: 25.0000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide
  grp_avg_time[i, :] /= grp_graph[i, :]
integration with ivp solver: 100%|██████████| 100/100 [00:05<00:00, 18.98it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 128.61it/s]
|-----> [iterate groups] in progress: 37.5000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide
  grp_avg_time[i, :] /= grp_graph[i, :]
integration with ivp solver: 100%|██████████| 100/100 [00:05<00:00, 18.94it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 124.88it/s]
|-----> [iterate groups] in progress: 50.0000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide
  grp_avg_time[i, :] /= grp_graph[i, :]
integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 22.68it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 134.18it/s]
|-----> [iterate groups] in progress: 62.5000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide
  grp_avg_time[i, :] /= grp_graph[i, :]
integration with ivp solver: 100%|██████████| 70/70 [00:03<00:00, 19.83it/s]
uniformly sampling points along a trajectory: 100%|██████████| 70/70 [00:00<00:00, 124.52it/s]
|-----> [iterate groups] in progress: 75.0000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide
  grp_avg_time[i, :] /= grp_graph[i, :]
integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 24.09it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 130.70it/s]
|-----> [iterate groups] in progress: 87.5000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide
  grp_avg_time[i, :] /= grp_graph[i, :]
integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 21.34it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 131.20it/s]
|-----> [iterate groups] in progress: 100.0000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide
  grp_avg_time[i, :] /= grp_graph[i, :]
integration with ivp solver: 100%|██████████| 100/100 [00:05<00:00, 18.43it/s]
uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 122.76it/s]
|-----> [iterate groups] completed [55.1253s]
|-----> [State graph estimation] completed [0.0020s]
|-----------> plotting with basis key=X_umap
|-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide
  grp_avg_time[i, :] /= grp_graph[i, :]
Saving figure to ./figure/05_cell_state_transition_map_dyn_savefig.png...
Done

Part 6: Reveal the cell state transition regulator via differential geometry¶

alt text
In [30]:
# Calculate the jacobian on our predefined genes
dyn.vf.jacobian(adata, regulators=pancreas_genes)
Transforming subset Jacobian: 100%|██████████| 3696/3696 [00:00<00:00, 44017.06it/s]
Out[30]:
AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'Size_Factor', 'initial_cell_size', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'ntr', 'cell_cycle_phase', 'pca_ddhodge_div', 'pca_ddhodge_potential', 'divergence_pca', 'control_point_pca', 'inlier_prob_pca', 'obs_vf_angle_pca', 'umap_ddhodge_div', 'umap_ddhodge_potential', 'curl_umap', 'divergence_umap', 'control_point_umap', 'inlier_prob_umap', 'obs_vf_angle_umap', 'speed_pca', 'acceleration_pca', 'jacobian_det_pca'
    var: 'highly_variable_genes', 'nCells', 'nCounts', 'pass_basic_filter', 'log_cv', 'log_m', 'score', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics', 'use_for_transition'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics', 'grid_velocity_umap', 'grid_velocity_pca', 'VecFld_pca', 'VecFld_umap', 'clusters_graph', 'jacobian_pca'
    obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores', 'velocity_umap', 'velocity_pca', 'velocity_pca_SparseVFC', 'X_pca_SparseVFC', 'velocity_umap_SparseVFC', 'X_umap_SparseVFC', 'acceleration_pca'
    layers: 'spliced', 'unspliced', 'X_unspliced', 'X_spliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S', 'acceleration'
    obsp: 'distances', 'connectivities', 'moments_con', 'pearson_transition_matrix', 'pca_ddhodge', 'umap_ddhodge'
In [31]:
# Ngn3 activates Pax4 in progenitors, initiating pancreatic endocrinogenesis
dyn.pl.jacobian(
    adata,
    basis="umap",
    regulators=[
        "Neurog3",
    ],
    effectors=["Pax4"],
    alpha=1,
    save_show_or_return='save',
    save_kwargs={
        "prefix": "06_reg_Ngn3_activate_Pax4",  # prefix
        "ext": "png",                       # file format
        "path": figure_path,                       # save dir (default current working dir)
        "dpi": 300,                         # resolution
        "transparent": True,
        "close": True,
        "verbose": True
    }
)
Saving figure to ./figure/06_reg_Ngn3_activate_Pax4_dyn_savefig.png...
Done
In [32]:
# Jacobian analyses reveal mutual inhibition of Pax4 and Arx at the bifurcation point in progenitors.
dyn.pl.jacobian(adata, basis="umap", regulators=["Arx"], effectors=["Pax4"], alpha=1)
dyn.pl.jacobian(adata, basis="umap", regulators=["Pax4"], effectors=["Arx"], alpha=1)
<Figure size 600x400 with 0 Axes>
In [33]:
# Pdx1 activates Ins2 in beta cells.
dyn.pl.jacobian(adata, basis="umap", regulators=["Pax4"], effectors=["Ins2"], alpha=1)

Part 7: Find core pathway on cell type transition¶

In [34]:
adata.obs["clusters"].unique()
Out[34]:
['Pre-endocrine', 'Ductal', 'Alpha', 'Ngn3 high EP', 'Delta', 'Beta', 'Ngn3 low EP', 'Epsilon']
Categories (8, object): ['Ductal', 'Ngn3 low EP', 'Ngn3 high EP', 'Pre-endocrine', 'Beta', 'Alpha', 'Delta', 'Epsilon']
In [35]:
# Rank the elements in the Jacobian
In [36]:
dyn.vf.acceleration(adata, basis='pca')
dyn.vf.rank_acceleration_genes(adata, groups='clusters')
|-----> [Calculating acceleration] in progress: 100.0000%|-----> [Calculating acceleration] completed [0.0734s]
Out[36]:
AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'Size_Factor', 'initial_cell_size', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'ntr', 'cell_cycle_phase', 'pca_ddhodge_div', 'pca_ddhodge_potential', 'divergence_pca', 'control_point_pca', 'inlier_prob_pca', 'obs_vf_angle_pca', 'umap_ddhodge_div', 'umap_ddhodge_potential', 'curl_umap', 'divergence_umap', 'control_point_umap', 'inlier_prob_umap', 'obs_vf_angle_umap', 'speed_pca', 'acceleration_pca', 'jacobian_det_pca'
    var: 'highly_variable_genes', 'nCells', 'nCounts', 'pass_basic_filter', 'log_cv', 'log_m', 'score', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics', 'use_for_transition'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics', 'grid_velocity_umap', 'grid_velocity_pca', 'VecFld_pca', 'VecFld_umap', 'clusters_graph', 'jacobian_pca', 'rank_acceleration', 'rank_abs_acceleration'
    obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores', 'velocity_umap', 'velocity_pca', 'velocity_pca_SparseVFC', 'X_pca_SparseVFC', 'velocity_umap_SparseVFC', 'X_umap_SparseVFC', 'acceleration_pca'
    layers: 'spliced', 'unspliced', 'X_unspliced', 'X_spliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S', 'acceleration'
    obsp: 'distances', 'connectivities', 'moments_con', 'pearson_transition_matrix', 'pca_ddhodge', 'umap_ddhodge'
In [37]:
adata.uns['rank_acceleration'][:5]
Out[37]:
Alpha Beta Delta Ductal Epsilon Ngn3 high EP Ngn3 low EP Pre-endocrine
0 Ghrl Ins1 Cck Hmgb2 Tmsb4x Chga Neurog3 Tmsb4x
1 Cck Ins2 Ins1 Ccnb2 Gcg Chgb Tmsb4x Pyy
2 Mdk Xist Cdkn1a Cdc20 Nkx6-1 Tm4sf4 Btbd17 Iapp
3 Rbp4 Krt8 Mdk Hist1h2bc Rps19 Akr1c19 Mdk Calr
4 Serpina1c Gpx3 Krt7 Dynll1 Tmsb10 Cryba2 Btg2 Ppp1r14a
In [38]:
# Enrichment analysis allows us to see if top ranking genes are significantly enriched in
# certain biological pathways. For example, when we pick the top 100 accelerating genes
# in Ductal cells, we found that they are highly enriched in cell cycle related pathways.
In [39]:
!pwd
/home/zongxu/Misc/lesson/scRNA-application
In [40]:
enr = dyn.ext.enrichr(
    adata.uns['rank_acceleration']['Ductal'][:100].to_list(),
    organism='mouse',
    gene_sets='GO_Biological_Process_2021',
    outdir='./figure'
)
enr.results.head(3)
Out[40]:
Gene_set Term Overlap P-value Adjusted P-value Old P-value Old Adjusted P-value Odds Ratio Combined Score Genes
0 GO_Biological_Process_2021 mitotic spindle organization (GO:0007052) 20/157 7.135193e-23 6.507296e-20 0 0 36.063869 1839.055984 NDE1;BUB1B;CDCA8;KIF23;DYNLL1;RANGAP1;PMF1;AUR...
1 GO_Biological_Process_2021 microtubule cytoskeleton organization involved... 17/128 7.210195e-20 3.287849e-17 0 0 36.515033 1609.444116 NDE1;BUB1B;CDCA8;DYNLL1;RANGAP1;PMF1;AURKA;CDC...
2 GO_Biological_Process_2021 mitotic sister chromatid segregation (GO:0000070) 13/102 3.535785e-15 1.074879e-12 0 0 33.261397 1106.800974 CDCA8;KIF23;KIF22;SMC4;SMC2;CENPE;CCNB1;KIFC1;...
In [41]:
from gseapy.plot import barplot, dotplot
barplot(enr.res2d, column = 'Adjusted P-value',title='GO_Biological_Process_2018', cutoff=0.05)
dotplot(enr.res2d, title='KEGG_2016',cmap='viridis_r', cutoff=0.05)
plt.show()
In [45]:
import sys
import pkg_resources

print("### Session Info ###\n")

loaded_pkgs = []

for mod_name, module in sys.modules.items():
    file_path = getattr(module, "__file__", None)
    if (
        file_path
        and "site-packages" in file_path
        and not mod_name.startswith("_")
        and "." not in mod_name
    ):
        try:
            version = pkg_resources.get_distribution(mod_name).version
            loaded_pkgs.append((mod_name, version))
        except Exception:
            pass

for name, version in sorted(set(loaded_pkgs)):
    print(f"{name:<15} v{version}")
### Session Info ###

IPython         v8.29.0
anndata         v0.10.9
asttokens       v2.4.1
certifi         v2024.8.30
cffi            v1.17.1
charset_normalizer v3.4.0
colorama        v0.4.6
colorcet        v3.1.0
comm            v0.2.2
cycler          v0.12.1
debugpy         v1.8.7
decorator       v5.1.1
executing       v2.1.0
fontTools       v4.54.1
gseapy          v1.1.3
h5py            v3.12.1
idna            v3.10
igraph          v0.11.6
ipykernel       v6.29.5
jedi            v0.19.1
joblib          v1.4.2
jupyter_client  v8.6.3
jupyter_core    v5.7.2
kiwisolver      v1.4.7
legacy_api_wrap v1.4
llvmlite        v0.43.0
louvain         v0.8.0
matplotlib      v3.7.3
matplotlib_inline v0.1.7
more_itertools  v10.5.0
natsort         v8.4.0
networkx        v3.4
numba           v0.60.0
numdifftools    v0.9.41
numpy           v1.23.5
packaging       v24.1
pandas          v2.2.3
parso           v0.8.4
patsy           v0.5.6
pexpect         v4.9.0
pickleshare     v0.7.5
platformdirs    v4.3.6
prompt_toolkit  v3.0.48
psutil          v6.1.0
ptyprocess      v0.7.0
pure_eval       v0.2.3
pycparser       v2.22
pygments        v2.18.0
pynndescent     v0.5.13
pyparsing       v3.2.0
pytz            v2024.1
requests        v2.32.3
scanpy          v1.10.3
scipy           v1.13.1
seaborn         v0.13.2
six             v1.16.0
stack_data      v0.6.2
statsmodels     v0.14.4
texttable       v1.7.0
threadpoolctl   v3.5.0
tornado         v6.4.1
tqdm            v4.66.6
traitlets       v5.14.3
typing_extensions v4.12.2
urllib3         v2.2.3
wcwidth         v0.2.13
In [ ]: